{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.7"
    },
    "colab": {
      "name": "Augmentation with TextAttack.ipynb",
      "provenance": []
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "m83IiqVREJ96"
      },
      "source": [
        "# TextAttack Augmentation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6UZ0d84hEJ98"
      },
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)\n",
        "\n",
        "[![View Source on GitHub](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/QData/TextAttack/blob/master/docs/2notebook/3_Augmentations.ipynb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qZ5xnoevEJ99"
      },
      "source": [
        "Augmenting a dataset using TextAttack requries only a few lines of code when it is done right. The `Augmenter` class is created for this purpose to generate augmentations of a string or a list of strings. Augmentation could be done in either python script or command line.\n",
        "\n",
        "### Creating an Augmenter\n",
        "\n",
        "The **Augmenter** class is essensial for performing data augmentation using TextAttack. It takes in four paramerters in the following order:\n",
        "\n",
        "\n",
        "1.  **transformation**: all [transformations](https://textattack.readthedocs.io/en/latest/apidoc/textattack.transformations.html) implemented by TextAttack can be used to create an `Augmenter`. Note here that if we want to apply multiple transformations in the same time, they first need to be incooporated into a `CompositeTransformation` class.\n",
        "2.  **constraints**: [constraints](https://textattack.readthedocs.io/en/latest/apidoc/textattack.constraints.html#) determine whether or not a given augmentation is valid, consequently enhancing the quality of the augmentations. The default augmenter does not have any constraints but contraints can be supplied as a list to the Augmenter.\n",
        "3.  **pct_words_to_swap**:  percentage of words to swap per augmented example. The default is set to 0.1 (10%).\n",
        "4.  **transformations_per_example** maximum number of augmentations per input. The default is set to 1 (one augmented sentence given one original input)\n",
        "\n",
        "An example of creating one's own augmenter is shown below. In this case, we are creating an augmenter with **RandomCharacterDeletion** and **WordSwapQWERTY** transformations, **RepeatModification** and **StopWordModification** constraints. A maximum of **50%** of the words could be purturbed, and 10 augmentations will be generated from each input sentence.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "5AXyxiLD4X93"
      },
      "source": [
        "!pip install textattack\n",
        "!pip install torch==1.6\n",
        "\n",
        "# import transformations, contraints, and the Augmenter\n",
        "from textattack.transformations import WordSwapRandomCharacterDeletion\n",
        "from textattack.transformations import WordSwapQWERTY\n",
        "from textattack.transformations import CompositeTransformation\n",
        "\n",
        "from textattack.constraints.pre_transformation import RepeatModification\n",
        "from textattack.constraints.pre_transformation import StopwordModification\n",
        "\n",
        "from textattack.augmentation import Augmenter"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wFeXF_OL-vyw",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "c041e77e-accd-4a58-88be-9b140dd0cd56"
      },
      "source": [
        "# Set up transformation using CompositeTransformation()\n",
        "transformation = CompositeTransformation([WordSwapRandomCharacterDeletion(), WordSwapQWERTY()])\n",
        "# Set up constraints\n",
        "constraints = [RepeatModification(), StopwordModification()]\n",
        "# Create augmenter with specified parameters\n",
        "augmenter = Augmenter(transformation=transformation, constraints=constraints, pct_words_to_swap=0.5, transformations_per_example=10)\n",
        "s = 'What I cannot create, I do not understand.'\n",
        "# Augment!\n",
        "augmenter.augment(s)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['Wat I cannog crexte, I do not ubderstand.',\n",
              " 'Wat I cnnot creae, I do not understanr.',\n",
              " 'Wha I canno ceeate, I do not ubderstand.',\n",
              " 'Whaf I camnot creatr, I do not understsnd.',\n",
              " 'Wht I cannt crete, I do not undrstand.',\n",
              " 'Wht I cnnot crewte, I do not undersyand.',\n",
              " 'Whzt I cannlt creare, I do not understajd.',\n",
              " 'Wuat I cannof cfeate, I do not undefstand.',\n",
              " 'Wuat I cannoy ceate, I do not ubderstand.',\n",
              " 'hat I annot cfeate, I do not undestand.']"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 19
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b7020KtvEJ9-"
      },
      "source": [
        "### Pre-built Augmentation Recipes\n",
        "\n",
        "In addition to creating our own augmenter, we could also use pre-built augmentation recipes to perturb datasets. These recipes are implemented from publishded papers and are very convenient to use. The list of available recipes can be found [here](https://textattack.readthedocs.io/en/latest/3recipes/augmenter_recipes.html).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pkBqK5wYQKZu"
      },
      "source": [
        "In the following example, we will use the `CheckListAugmenter` to showcase our augmentation recipes. The `CheckListAugmenter` augments words by using the transformation methods provided by CheckList INV testing, which combines **Name Replacement**, **Location Replacement**, **Number Alteration**, and **Contraction/Extension**. The original paper can be found here: [\"Beyond Accuracy: Behavioral Testing of NLP models with CheckList\" (Ribeiro et al., 2020)](https://arxiv.org/abs/2005.04118)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WkYiVH6lQedu",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "cd5ffc65-ca80-45cd-b3bb-d023bcad09a4"
      },
      "source": [
        "# import the CheckListAugmenter\n",
        "from textattack.augmentation import CheckListAugmenter\n",
        "# Alter default values if desired\n",
        "augmenter = CheckListAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
        "s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
        "# Augment\n",
        "augmenter.augment(s)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['I would love to go to Central African Republic but the tickets are 500 dollars',\n",
              " 'I would love to go to Japan but the tickets are 707 dollars',\n",
              " 'I would love to go to Kosovo but the tickets are 500 dollars',\n",
              " \"I'd love to go to Dominica but the tickets are 437 dollars\",\n",
              " \"I'd love to go to New Caledonia but the tickets are 697 dollars\"]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 5
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5vn22xrLST0H"
      },
      "source": [
        "Note that the previous snippet of code is equivalent of running\n",
        "\n",
        "```\n",
        "textattack augment --recipe checklist --pct-words-to-swap .1 --transformations-per-example 5 --exclude-original --interactive\n",
        "```\n",
        "in command line.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VqfmCKz0XY-Y"
      },
      "source": [
        "\n",
        "\n",
        "\n",
        "Here's another example of using `WordNetAugmenter`:\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "l2b-4scuXvkA",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "72a78a95-ffc0-4d2a-b98c-b456d338807d"
      },
      "source": [
        "from textattack.augmentation import WordNetAugmenter\n",
        "augmenter = WordNetAugmenter(pct_words_to_swap=0.2, transformations_per_example=5)\n",
        "s = \"I'd love to go to Japan but the tickets are 500 dollars\"\n",
        "augmenter.augment(s)"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[\"I'd hump to go to Japan but the slate are 500 dollars\",\n",
              " \"I'd love to go to Nippon but the tickets are 500 buck\",\n",
              " \"I'd love to go to japan but the tickets are 500 dollar\",\n",
              " \"I'd love to perish to Japan but the fine are 500 dollars\",\n",
              " \"I'd love to start to Japan but the tickets are 500 buck\"]"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 7
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "collapsed": true,
        "id": "whvwbHLVEJ-S"
      },
      "source": [
        "### Conclusion\n",
        "We have now went through the basics in running `Augmenter` by either creating a new augmenter from scratch or using a pre-built augmenter. This could be done in as few as 4 lines of code so please give it a try if you haven't already! 🐙"
      ]
    }
  ]
}